 Skagerrak


DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Xu, Kaixuan, Chai, Jiajun, Li, Sicheng, Fu, Yuqian, Zhu, Yuanheng, Zhao, Dongbin

arXiv.org Artificial Intelligence

Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
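The autoregressive factorization described in the abstract can be illustrated with the chain rule: the probability of a joint multi-unit action is the product of per-unit conditional probabilities, each conditioned on the orders already chosen. The sketch below is a minimal illustration only. The function `joint_log_prob`, the toy policies `unit0` and `unit1`, and the order names are hypothetical and are not taken from the DipLLM paper or its code.

```python
import math


def joint_log_prob(unit_policies, orders):
    """Log-probability of a joint action under an autoregressive factorization.

    unit_policies[i](prefix) returns a dict mapping each candidate order for
    unit i to its probability, given the tuple of orders already chosen for
    units 0..i-1. (Hypothetical interface, for illustration only.)
    """
    total = 0.0
    for i, order in enumerate(orders):
        dist = unit_policies[i](tuple(orders[:i]))
        total += math.log(dist[order])
    return total


# Toy, made-up policies for two units.
def unit0(prefix):
    return {"hold": 0.25, "move": 0.75}


def unit1(prefix):
    # Unit 1 is more likely to support when unit 0 has chosen to move.
    if prefix == ("move",):
        return {"support": 0.8, "hold": 0.2}
    return {"support": 0.5, "hold": 0.5}


policies = [unit0, unit1]
p_move_support = math.exp(joint_log_prob(policies, ["move", "support"]))

# The factorized probabilities over all joint actions still sum to one.
total_mass = sum(
    math.exp(joint_log_prob(policies, [a, b]))
    for a in ("hold", "move")
    for b in ("support", "hold")
)
```

Each unit-level decision here ranges over a handful of orders, whereas the joint action space grows as the product of the per-unit choices; that is the simplification the abstract refers to.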


More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

Wongkamjan, Wichayaporn, Gu, Feng, Wang, Yanze, Hermjakob, Ulf, May, Jonathan, Stewart, Brandon M., Kummerfeld, Jonathan K., Peskoff, Denis, Boyd-Graber, Jordan Lee

arXiv.org Artificial Intelligence

The boardgame Diplomacy is a challenging setting for communicative and cooperative artificial intelligence. The most prominent communicative Diplomacy AI, Cicero, has excellent strategic abilities, exceeding human players. However, the best Diplomacy players master communication, not just tactics, which is why the game has received attention as an AI challenge. This work seeks to understand the degree to which Cicero succeeds at communication. First, we annotate in-game communication with abstract meaning representation to separate in-game tactics from general language. Second, we run two dozen games with humans and Cicero, totaling over 200 human-player hours of competition. While AI can consistently outplay human players, AI-Human communication is still limited because of AI's difficulty with deception and persuasion. This shows that Cicero relies on strategy and has not yet reached the full promise of communicative and cooperative AI.


Welfare Diplomacy: Benchmarking Language Model Cooperation

Mukobi, Gabriel, Erlebach, Hannah, Lauffer, Niklas, Hammond, Lewis, Chan, Alan, Clifton, Jesse

arXiv.org Artificial Intelligence

The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.


Revealing interactions between HVDC cross-area flows and frequency stability with explainable AI

Pütz, Sebastian, Schäfer, Benjamin, Witthaut, Dirk, Kruse, Johannes

arXiv.org Artificial Intelligence

The energy transition introduces more volatile energy sources into the power grids. In this context, power transfer between different synchronous areas through High Voltage Direct Current (HVDC) links becomes increasingly important. Such links can balance volatile generation by enabling long-distance transport or by leveraging their fast control behavior. Here, we investigate the interaction of power imbalances, represented through the power grid frequency, and power flows on HVDC links between synchronous areas in Europe. We use explainable machine learning to identify key dependencies and disentangle the interaction of critical features. Our results show that market-based HVDC flows introduce deterministic frequency deviations, which, however, can be mitigated through strict ramping limits. Moreover, varying HVDC operation modes strongly affect the interaction with the grid. In particular, we show that load-frequency control via HVDC links can have either control-like or disturbance-like impacts on frequency stability.


Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

Chen, Kun, Dong, Ruipeng, Xu, Wanwan, Zheng, Zemin

arXiv.org Machine Learning

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, and vector time series modeling, among others. The appeal of this factorization lies in its power to discover a highly interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea may be applicable to many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.


Short-Term Forecasting of CO2 Emission Intensity in Power Grids by Machine Learning

Leerbeck, Kenneth, Bacher, Peder, Junker, Rune, Goranović, Goran, Corradi, Olivier, Ebrahimy, Razgar, Tveit, Anna, Madsen, Henrik

arXiv.org Machine Learning

A machine learning algorithm is developed to forecast the CO2 emission intensities in electrical power grids in the Danish bidding zone DK2, distinguishing between average and marginal emissions. The analysis was done on a data set comprising a large number (473) of explanatory variables, such as power production, demand, import, and weather conditions, collected from selected neighboring zones. The number was reduced to fewer than 50 using both LASSO (a penalized linear regression analysis) and a forward feature selection algorithm. Three linear regression models that capture different aspects of the data (such as non-linearities and coupling of variables) were created and combined into a final model using a Softmax weighted average. Cross-validation is performed for debiasing, and an autoregressive integrated moving average (ARIMA) model is implemented to correct the residuals, making the final model the variant with exogenous inputs (ARIMAX). The forecasts with the corresponding uncertainties are given for two time horizons, below and above six hours. Marginal emissions turned out to be independent of conditions in the DK2 zone, suggesting that the marginal generators are located in the neighbouring zones. The developed methodology can be applied to any bidding zone in the European electricity network without requiring detailed knowledge about the zone.
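The abstract does not say how the Softmax weights are computed, so the following is only a sketch of one common way to form a Softmax weighted average of several model forecasts: each model gets a weight proportional to the exponential of its negative recent error. The function names, the `temperature` parameter, and the example numbers are assumptions made for illustration, not details from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def combine_forecasts(forecasts, errors, temperature=1.0):
    """Blend model forecasts, giving lower-error models larger weights.

    forecasts: one forecast per model (same unit, same time step).
    errors: a recent error measure per model (assumed; e.g. validation RMSE).
    temperature: hypothetical knob; larger values flatten the weights.
    """
    weights = softmax([-e / temperature for e in errors])
    blended = sum(w * f for w, f in zip(weights, forecasts))
    return blended, weights


# Made-up numbers: three models, the first having the lowest recent error.
blended, weights = combine_forecasts([100.0, 200.0, 300.0], [1.0, 2.0, 3.0])

# With equal errors the blend reduces to the plain average.
equal_blend, equal_weights = combine_forecasts([100.0, 200.0, 300.0], [1.0, 1.0, 1.0])
```

The weights are positive and sum to one, so the blended forecast always lies between the smallest and largest individual forecast.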


Assessing the performance of statistical classifiers to discriminate fish stocks using Fourier analysis of otolith shape - Canadian Journal of Fisheries and Aquatic Sciences

#artificialintelligence

The assignment of individual fish to their stock of origin is important for reliable stock assessment and fisheries management. Otolith shape is commonly used as the marker of distinct stocks in discrimination studies. Our literature review showed that the application and comparison of alternative statistical classifiers to discriminate fish stocks based on otolith shape are limited. Therefore, we compared the performance of two traditional and four machine learning classifiers based on Fourier analysis of otolith shape using selected stocks of Atlantic cod (Gadus morhua) in the southern Baltic and Atlantic herring (Clupea harengus) in the western Norwegian Sea, Skagerrak and the southern Baltic Sea. Our results showed that the stocks can be successfully discriminated based on their otolith shapes. We observed significant differences in the accuracy obtained by the tested classifiers.
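The abstract does not specify which Fourier representation of otolith shape was used. As a rough illustration of the general idea, the sketch below computes complex Fourier descriptors of a closed outline and normalizes them so that they do not depend on position, size, rotation, or starting point. The function name, the normalization choices, and the circular test outline are assumptions; the study itself may have used a different variant, such as elliptic Fourier analysis.

```python
import cmath
import math


def fourier_descriptors(points, n_harmonics):
    """Normalized Fourier descriptor magnitudes of a closed 2D outline.

    points: (x, y) pairs sampled along the outline, assumed to be traversed
    counter-clockwise and to contain more than 2 * n_harmonics points.
    Returns a dict {harmonic k: |c_k| / |c_1|} for k != 0.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = {}
    for k in range(-n_harmonics, n_harmonics + 1):
        coeffs[k] = sum(
            z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        ) / n
    # Dropping k = 0 removes position, dividing by |c_1| removes size, and
    # taking magnitudes removes rotation and starting-point effects.
    scale = abs(coeffs[1])
    return {k: abs(c) / scale for k, c in coeffs.items() if k != 0}


def circle(radius, cx, cy, n=64):
    """Sample a counter-clockwise circle (a made-up test outline)."""
    return [
        (cx + radius * math.cos(2 * math.pi * t / n),
         cy + radius * math.sin(2 * math.pi * t / n))
        for t in range(n)
    ]


small = fourier_descriptors(circle(3.0, 5.0, -2.0), n_harmonics=3)
large = fourier_descriptors(circle(30.0, 0.0, 0.0), n_harmonics=3)
```

Descriptor vectors of this kind are what a statistical classifier would take as input features; for a perfect circle only the first harmonic is non-zero, and real outlines differ in their higher harmonics.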


SOFAR: large-scale association network learning

Uematsu, Yoshimasa, Fan, Yingying, Chen, Kun, Lv, Jinchi, Lin, Wei

arXiv.org Machine Learning

Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulation and real data examples.